PMI Matrix Approximations with Applications to Neural Language Modeling
نویسندگان
چکیده
The negative sampling (NEG) objective function, used in word2vec, is a simplification of the Noise Contrastive Estimation (NCE) method. NEG was found to be highly effective in learning continuous word representations. However, unlike NCE, it was considered inapplicable for the purpose of learning the parameters of a language model. In this study, we refute this assertion by providing a principled derivation for NEG-based language modeling, founded on a novel analysis of a low-dimensional approximation of the matrix of pointwise mutual information between the contexts and the predicted words. The obtained language modeling is closely related to NCE language models but is based on a simplified objective function. We thus provide a unified formulation for two main language processing tasks, namely word embedding and language modeling, based on the NEG objective function. Experimental results on two popular language modeling benchmarks show comparable perplexity results, with a small advantage to NEG over NCE.
منابع مشابه
A Simple Language Model based on PMI Matrix Approximations
In this study, we introduce a new approach for learning language models by training them to estimate word-context pointwise mutual information (PMI), and then deriving the desired conditional probabilities from PMI at test time. Specifically, we show that with minor modifications to word2vec’s algorithm, we get principled language models that are closely related to the well-established Noise Co...
متن کاملRandom walks on discourse spaces: a new generative language model with applications to semantic word embeddings
Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods such as Latent Semantic Analysis (LSA), generative text models such as topic models, matrix factorization, neural nets, and energy-based models. Many methods use nonlinear operations —such as Pairwise Mutual Information or PMI— on co-occurrence statistics, and have hand-tuned hyperparameter...
متن کاملGeneralized Nonnegative Matrix Approximations with Bregman Divergences
Nonnegative matrix approximation (NNMA) is a recent technique for dimensionality reduction and data analysis that yields a parts based, sparse nonnegative representation for nonnegative input data. NNMA has found a wide variety of applications, including text analysis, document clustering, face/image recognition, language modeling, speech processing and many others. Despite these numerous appli...
متن کاملNeuro-ACT Cognitive Architecture Applications in Modeling Driver’s Steering Behavior in Turns
Cognitive Architectures (CAs) are the core of artificial cognitive systems. A CA is supposed to specify the human brain at a level of abstraction suitable for explaining how it achieves the functions of the mind. Over the years a number of distinct CAs have been proposed by different authors and their limitations and potentials were investigated. These CAs are usually classified as symbolic and...
متن کاملAuto-Sizing Neural Networks: With Applications to n-gram Language Models
Neural networks have been shown to improve performance across a range of natural-language tasks. However, designing and training them can be complicated. Frequently, researchers resort to repeated experimentation to pick optimal settings. In this paper, we address the issue of choosing the correct number of units in hidden layers. We introduce a method for automatically adjusting network size b...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1609.01235 شماره
صفحات -
تاریخ انتشار 2016